107 research outputs found
Achieving Differential Privacy and Fairness in Machine Learning
Machine learning algorithms are used to make decisions in various applications, such as recruiting, lending and policing. These algorithms rely on large amounts of sensitive individual information to work properly. Hence, there are societal concerns about machine learning algorithms on matters like privacy and fairness. Currently, many studies focus only on protecting individual privacy or ensuring fairness of algorithms separately, without considering their connection. However, new challenges are arising in privacy-preserving and fairness-aware machine learning. On one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in machine learning algorithms. On the other hand, there is fairness between the private model and the non-private model, i.e., how to ensure that the utility loss due to differential privacy is the same for each group.
The goal of this dissertation is to address challenging issues in privacy preserving and fairness-aware machine learning: achieving differential privacy with satisfactory utility and efficiency in complex and emerging tasks, using generative models to generate fair data and to assist fair classification, achieving both differential privacy and fairness simultaneously within the same model, and achieving equal utility loss w.r.t. each group between the private model and the non-private model.
In this dissertation, we develop the following algorithms to address the above challenges.
(1) We develop PrivPC and DPNE algorithms to achieve differential privacy in complex and emerging tasks of causal graph discovery and network embedding, respectively.
(2) We develop the fair generative adversarial neural networks framework and three algorithms (FairGAN, FairGAN+ and CFGAN) to achieve fair data generation and classification through generative models based on different association-based and causation-based fairness notions.
(3) We develop PFLR and PFLR* algorithms to simultaneously achieve both differential privacy and fairness in logistic regression.
(4) We develop a DPSGD-F algorithm to remove the disparate impact of differential privacy on model accuracy w.r.t. each group.
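To make item (4) more concrete, the following is a minimal sketch of one generic differentially private SGD step (per-example gradient clipping followed by Gaussian noise). It is an illustration under stated assumptions, not the dissertation's DPSGD-F algorithm: the function names, the parameters `max_norm` and `noise_multiplier`, and the plain-list gradient representation are all hypothetical.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One noisy step: clip each example's gradient, sum, add noise, average.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * max_norm is added to the summed clipped gradients,
    as in the commonly described DP-SGD recipe.
    """
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * max_norm
    n = len(per_example_grads)
    noisy_mean = [(total[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Hypothetical usage with two per-example gradients in a 2-D model.
rng = random.Random(0)
new_weights = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.6, 0.8]],
                          lr=0.1, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Because clipping bounds every example's gradient by the same norm, examples with typically larger gradients lose proportionally more signal. That is one plausible source of the unequal per-group accuracy loss the abstract refers to, though the abstract itself does not state the mechanism.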
Synthesis of 4-thio-5-(2′′-thienyl)uridine and cytotoxicity activity against colon cancer cells in vitro
A novel anti-tumor agent, 4-thio-5-(2′′-thienyl)uridine (6), was synthesized, and its in vitro cytotoxicity against mouse colon cancer cells (MC-38) and human colon cancer cells (HT-29) was evaluated by MTT assay. The results showed that the novel compound had antiproliferative activity toward MC-38 and HT-29 cells in a dose-dependent manner. Cell cycle analysis by flow cytometry indicated that compound 6 inhibited tumor cell proliferation by arresting HT-29 cells in the G2/M phase. In addition, cell death detected by propidium iodide staining showed that compound 6 efficiently induced apoptosis in a concentration-dependent manner. Moreover, human fibroblast cells were far less sensitive to compound 6 than tumor cells, suggesting a specific anti-tumor effect of 4-thio-5-(2′′-thienyl)uridine. Taken together, novel compound 6 effectively inhibits colon cancer cell proliferation and hence may have potential value in clinical application as an antitumor agent.
Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment
Understanding the mobile traffic patterns of large scale cellular towers in urban
environments is extremely valuable for Internet service providers, mobile users,
and government managers of modern metropolises. This paper aims at extracting and
modeling the traffic patterns of large scale towers deployed in a metropolitan
city. To achieve this goal, we need to address several challenges, including
lack of appropriate tools for processing large scale traffic measurement data,
unknown traffic patterns, as well as handling complicated factors of urban
ecology and human behaviors that affect traffic patterns. Our core contribution
is a powerful model which combines three dimensional information (time,
locations of towers, and traffic frequency spectrum) to extract and model the
traffic patterns of thousands of cellular towers. Our empirical analysis
reveals the following important observations. First, only five basic
time-domain traffic patterns exist among the 9,600 cellular towers. Second,
each extracted traffic pattern maps to one type of geographical
location related to urban ecology: residential area, business
district, transport, entertainment, or comprehensive area. Third, our
frequency-domain traffic spectrum analysis suggests that the traffic of any
tower among the 9,600 can be constructed using a linear combination of four
primary components corresponding to human activity behaviors. We believe that
the proposed traffic pattern extraction and modeling methodology, combined
with the empirical analysis of the mobile traffic, paves the way toward a deep
understanding of the traffic patterns of large scale cellular towers in a modern
metropolis. Comment: To appear at IMC 201
Fine-grained Anomaly Detection in Sequential Data via Counterfactual Explanations
Anomaly detection in sequential data has been studied for a long time because
of its potential in various applications, such as detecting abnormal system
behaviors from log data. Although many approaches can achieve good performance
on anomalous sequence detection, how to identify the anomalous entries in
sequences is still challenging due to a lack of information at the entry-level.
In this work, we propose a novel framework called CFDet for fine-grained
anomalous entry detection. CFDet leverages the idea of interpretable machine
learning. Given a sequence that is detected as anomalous, we can consider
anomalous entry detection as an interpretable machine learning task because
identifying anomalous entries in the sequence is to provide an interpretation
to the detection result. We make use of the deep support vector data
description (Deep SVDD) approach to detect anomalous sequences and propose a
novel counterfactual interpretation-based approach to identify anomalous
entries in the sequences. Experimental results on three datasets show that
CFDet can correctly detect anomalous entries.
IF2Net: Innately Forgetting-Free Networks for Continual Learning
Continual learning can incrementally absorb new concepts without interfering
with previously learned knowledge. Motivated by the characteristics of neural
networks, in which information is stored in weights on connections, we
investigated how to design an Innately Forgetting-Free Network (IF2Net) for
the continual learning context. This study proposed a straightforward yet effective
learning paradigm by ingeniously keeping the weights relative to each seen task
untouched before and after learning a new task. We first presented the novel
representation-level learning on task sequences with random weights. This
technique refers to tweaking the drifted representations caused by
randomization back to their separate task-optimal working states, but the
involved weights are frozen and reused (opposite to well-known layer-wise
updates of weights). Then, sequential decision-making without forgetting can be
achieved by projecting the output weight updates into the parsimonious
orthogonal space, making the adaptations not disturb old knowledge while
maintaining model plasticity. IF2Net allows a single network to inherently
learn unlimited mapping rules without telling task identities at test time by
integrating the respective strengths of randomization and orthogonalization. We
validated the effectiveness of our approach through extensive theoretical
analysis and empirical study. Comment: 16 pages, 8 figures. Under review
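The idea of projecting weight updates into an orthogonal space can be illustrated with a small sketch. It shows only the generic technique (removing from an update every component that lies in the span of inputs seen on earlier tasks) and is not IF2Net's actual procedure. The helper names and the use of Gram-Schmidt on plain Python lists are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, basis):
    """Remove from `update` its components along each basis vector."""
    w = list(update)
    for u in basis:
        c = dot(w, u)
        w = [wi - c * ui for wi, ui in zip(w, u)]
    return w


# Hypothetical example: two inputs from an old task, one proposed update.
old_inputs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
proposed_update = [1.0, 2.0, 3.0]
safe_update = project_out(proposed_update, orthonormal_basis(old_inputs))
```

For a linear output layer, adding `safe_update` to the weights leaves the responses to the old inputs unchanged, because the update is orthogonal to each of them. The remaining component is still free to fit the new task.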
IoT vs. Human: A Comparison of Mobility
Internet of Things (IoT) devices are rapidly becoming an indispensable part of our lives, with increasing deployment in many promising areas, including tele-health, smart cities, and intelligent agriculture. Understanding the mobility of IoT devices is essential to improving quality of service in IoT applications, such as route planning in logistics management, infrastructure deployment, cellular network updates, and congestion detection in intelligent traffic. Despite its importance, there are not many results pertaining to the mobility of IoT devices. In this article, we aim to answer three research questions: (i) what are the mobility patterns of IoT devices? (ii) what are the differences between IoT device and smartphone mobility patterns? (iii) how do IoT device mobility patterns differ among device types and usage scenarios? We present a comprehensive characterization of IoT device mobility patterns from the perspective of cellular data networks, using a 36-day signal trace, including 1.5 million IoT devices and 0.425 million smartphones, collected from a nationwide cellular network in China. We first investigate the basic patterns of IoT devices from two perspectives: temporal and spatial characteristics. Our study finds that IoT device mobility exhibits significantly different patterns compared with smartphones in multiple aspects. For instance, IoT devices move more frequently and have a larger radius of gyration. Then we explore the essential mobility of IoT devices by utilizing two models that reveal the nature of human mobility, i.e., the exploration and preferential return (EPR) model and the entropy-based predictability model. We find that IoT devices, with few exceptions, behave very differently from humans, and we further derive a new formulation to describe their movement. We also find that the gap in mobility predictability and predictability limit between IoT devices and humans is not as large as expected. Peer reviewed
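The radius of gyration mentioned above has a simple standard definition: the root-mean-square distance of a device's recorded locations from their centroid. The sketch below assumes locations are already given as planar (x, y) coordinates in a common unit such as kilometres. The abstract does not say how the study handles coordinates or weighting, so the function name and input format are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of the visited points from their centroid."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one location")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Hypothetical traces: a device that never moves and one that shuttles
# between two sites 2 units apart.
stationary = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
shuttle = [(0.0, 0.0), (2.0, 0.0)]
rg_stationary = radius_of_gyration(stationary)
rg_shuttle = radius_of_gyration(shuttle)
```

A device that stays in place has a radius of gyration of zero, while wider-ranging movement yields larger values. This is the sense in which the metric summarizes how far a device typically roams.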
How enlightened self-interest guided global vaccine sharing benefits all: a modelling study
Background: Despite the consensus that vaccines play an important role in
combating the global spread of infectious diseases, vaccine inequity is still
rampant, with a deep-seated mentality of self-priority. This study aims to
evaluate the existence and possible outcomes of a more equitable global vaccine
distribution and explore a concrete incentive mechanism that promotes vaccine
equity. Methods: We design a metapopulation epidemiological model that
simultaneously considers global vaccine distribution and human mobility, which
is then calibrated by the number of infections and real-world vaccination
records during the COVID-19 pandemic from March 2020 to July 2021. We explore the
possibility of the enlightened self-interest incentive mechanism, i.e.,
improving one's own epidemic outcomes by sharing vaccines with other countries,
by evaluating the number of infections and deaths under various vaccine sharing
strategies using the proposed model. To understand how these strategies affect
the national interests, we distinguish the imported and local cases for further
cost-benefit analyses that rationalize the enlightened self-interest incentive
mechanism behind vaccine sharing. ... Comment: Accepted by Journal of Global Health
The Synthesis of (E)-4-Thio-5-(2-Bromovinyl)Uridine/Deoxyuridine and Its Characterization and Cytotoxicity
(E)-4-Thio-5-(2-bromovinyl)uridine/2'-deoxyuridine (8a/8b) were synthesized efficiently and in an environmentally friendly way from uridine/2'-deoxyuridine (1a/1b), which were first transformed to (E)-(2-bromovinyl)uridine/2'-deoxyuridine (5a/5b) via iodination, selective oxidation, and Heck reaction steps. The resulting products (5a/5b) were then converted to the targets (8a/8b) through esterification, thionation of the carbonyl, and hydrolysis steps. Two new compounds (8a/8b) and three new intermediates (7a, 7b, 10) were obtained, and their structures were fully characterized by 1H NMR, 13C NMR, IR, UV, HR-MS, and X-ray analysis. The cytotoxicity of 8a and its derivatives was studied using the MTT assay, and the initial findings suggest that (E)-4-thio-5-(2-bromovinyl)uridine/2'-deoxyuridine (8a/8b) are potential antitumor drugs.